Guidance to best tools and practices for systematic reviews

Data continue to accumulate indicating that many systematic reviews are methodologically flawed, biased, redundant, or uninformative. Some improvements have occurred in recent years based on empirical methods research and standardization of appraisal tools; however, many authors do not routinely or consistently apply these updated methods. In addition, guideline developers, peer reviewers, and journal editors often disregard current methodological standards. Although these issues are extensively acknowledged and explored in the methodological literature, most clinicians seem unaware of them and may automatically accept evidence syntheses (and clinical practice guidelines based on their conclusions) as trustworthy. A plethora of methods and tools are recommended for the development and evaluation of evidence syntheses. It is important to understand what these are intended to do (and cannot do) and how they can be used. Our objective is to distill this sprawling information into a format that is understandable and readily accessible to authors, peer reviewers, and editors. In doing so, we aim to promote appreciation and understanding of the demanding science of evidence synthesis among stakeholders. We focus on well-documented deficiencies in key components of evidence syntheses to elucidate the rationale for current standards. The constructs underlying the tools developed to assess reporting, risk of bias, and methodological quality of evidence syntheses are distinguished from those involved in determining overall certainty of a body of evidence. Another important distinction is made between the tools used by authors to develop their syntheses and those used to ultimately judge their work. Exemplar methods and research practices are described, complemented by novel pragmatic strategies to improve evidence syntheses. The latter include preferred terminology and a scheme to characterize types of research evidence.
We organize best practice resources in a Concise Guide that can be widely adopted and adapted for routine implementation by authors and journals. Appropriate, informed use of these resources is encouraged, but we caution against their superficial application and emphasize that their endorsement does not substitute for in-depth methodological training. By highlighting best practices with their rationale, we hope this guidance will inspire further evolution of methods and tools that can advance the field.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12879-023-08304-x.

Primary studies typically report either quantitative data or qualitative data. Quantitative data are expressed numerically and analyzed statistically; they are collected from experiments and tests, metrics, databases, and surveys. Such data are commonly reported in healthcare research including studies of intervention effectiveness, satisfaction with care, the incidence, prevalence, and etiology of diseases, and the properties of measurement tools. 1 Qualitative data are descriptive (eg, concepts, meanings, words) rather than numerical and are collected through interviews, observations, and textual analyses. Qualitative research studies in healthcare investigate the impact of illnesses and interventions and explore the experiences, attitudes, beliefs, and perspectives of patients, caregivers, and clinicians. 2 Qualitative systematic reviews synthesize these data using meta-aggregation 2 or an interpretative approach (eg, meta-ethnography, critical interpretative synthesis, realist synthesis). 3

Group and single case
These two broadly defined approaches may attempt to establish causal relationships or describe associations. 4 In group research, data collected from groups of individuals are analyzed, which allows the effectiveness of treatments to be tested at the group level. "Between group" designs are typical of clinical research in medicine. These studies compare participants who have different exposures (eg, control versus experimental) or who differ on some feature (eg, gender, disease risk factor, test measurement or score). 5 Less commonly, studies of groups use a "within group" design (also referred to as "within-subjects"). Such studies collect data from groups of participants exposed to the same condition at various times (eg, before/after, or with repeated exposures).
Single case experimental designs are also known as single-subject, N-of-1, or small-n designs. These are also characterized by repeated measurements over time in participants with the same exposures; however, in contrast to group design research, the individual case serves as the unit of analysis. This may be one person or an entity such as a classroom or an organization 6; for this reason, we prefer use of "single case experimental design" (SCED) to describe these studies. SCEDs typically involve numerous repeated measurements along with multiple methods for ensuring accuracy and fidelity of the data. 7 Confidence in the validity of the data from individuals or entities may be enhanced through replication with additional participants. 8 SCEDs are standard in psychology and common in education, social work, and communication disorder research but can be encountered in many biomedical specialties.

Randomized and non-randomized designs
We follow the example of Cochrane 9 and others 10 and avoid the distinction between experimental and observational designs in favor of randomized versus non-randomized. Randomized trials vary less in design than non-randomized studies. The research question in randomized trials must be specific. It is investigated by comparison of intervention and control groups that should be homogeneous as well as randomly assigned. When possible, blinding of patients, interventionists, and assessors is recommended. Randomized trials are typically used to test hypotheses about new or untested interventions.
In contrast, non-randomized studies of interventions (NRSI) represent a number of diverse designs that are commonly classified using ambiguous labels (Table AF2B). Use of these labels by systematic review authors is discouraged by Cochrane. 9 Studies that do not randomize subjects provide descriptive information (prevalence and incidence) and/or analyses of associations. Some describe a single cohort with an "exposure" (risk factor or intervention) that allows calculation of an absolute risk of a disease or disease-related outcome.
More commonly, non-randomized studies compare outcomes of cohorts with different exposures that allow calculation of relative effect measures. 12,13
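The difference between an absolute risk from a single cohort and a relative effect measure from two cohorts can be illustrated with a minimal sketch. The class and function names (Cohort, risk_ratio, risk_difference) and all counts below are hypothetical and chosen only for illustration; they are not taken from the guidance or from any cited study, and the risk ratio and risk difference are shown as two common examples of comparative measures rather than a complete list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """Hypothetical cohort: number with the outcome and number followed."""
    events: int
    total: int

    def absolute_risk(self) -> float:
        """Proportion of the cohort that experienced the outcome."""
        if self.total <= 0:
            raise ValueError("total must be positive")
        if not 0 <= self.events <= self.total:
            raise ValueError("events must be between 0 and total")
        return self.events / self.total


def risk_ratio(exposed: Cohort, unexposed: Cohort) -> float:
    """Relative effect: risk in the exposed cohort divided by risk in the unexposed cohort."""
    baseline = unexposed.absolute_risk()
    if baseline == 0:
        raise ValueError("risk ratio is undefined when the unexposed risk is zero")
    return exposed.absolute_risk() / baseline


def risk_difference(exposed: Cohort, unexposed: Cohort) -> float:
    """Difference in absolute risk between the exposed and unexposed cohorts."""
    return exposed.absolute_risk() - unexposed.absolute_risk()


# Illustrative counts only (assumed, not real data).
exposed = Cohort(events=30, total=200)
unexposed = Cohort(events=15, total=200)

print(f"Absolute risk, exposed:   {exposed.absolute_risk():.3f}")
print(f"Absolute risk, unexposed: {unexposed.absolute_risk():.3f}")
print(f"Risk ratio:               {risk_ratio(exposed, unexposed):.2f}")
print(f"Risk difference:          {risk_difference(exposed, unexposed):.3f}")
```

A single-cohort study supports only the absolute_risk calculation; the two comparative functions require a second cohort with a different exposure, which is why they take two arguments.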